This HTML document contains the output of the CATS-rb transcriptome assembly comparison tool. For more details on each table and figure, refer to the tool’s documentation.

General transcriptome assembly statistics

Table 1. General transcriptome assembly statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N transcripts 35722 27748 30651 36610 18601 18652 19217
Total transcriptome length (bp) 92747399 34167218 32433743 27462499 44994186 44145594 42094487
GC content (%) 49.14% 49.17% 49.34% 49.68% 48.43% 48.45% 48.60%
N, % transcripts with length greater than or equal to 200 bp 34155, 95.61% 23903, 86.14% 25617, 83.58% 28233, 77.12% 18277, 98.26% 18225, 97.71% 18132, 94.35%
N, % transcripts with length greater than or equal to 500 bp 32238, 90.25% 14553, 52.45% 13923, 45.42% 11423, 31.2% 16878, 90.74% 16738, 89.74% 16335, 85%
N, % transcripts with length greater than or equal to 1000 bp 26475, 74.11% 9887, 35.63% 8846, 28.86% 6486, 17.72% 13488, 72.51% 13356, 71.61% 12875, 67%
N, % transcripts with length greater than or equal to 5000 bp 4248, 11.89% 1036, 3.73% 914, 2.98% 608, 1.66% 2003, 10.77% 1931, 10.35% 1766, 9.19%
N, % transcripts with length greater than or equal to 10000 bp 620, 1.74% 93, 0.34% 90, 0.29% 59, 0.16% 228, 1.23% 221, 1.18% 202, 1.05%
N, % transcripts with length greater than or equal to 20000 bp 61, 0.17% 4, 0.01% 6, 0.02% 6, 0.02% 10, 0.05% 10, 0.05% 6, 0.03%
Mean transcript length (bp) 2596.37 1231.34 1058.16 750.14 2418.91 2366.8 2190.48
Median transcript length (bp) 1879 547.5 434 310 1756 1714 1575
Transcript length IQR (bp) 970-3370 258-1570 234-1238 206-641 926-3168 899-3096 762-2878
Transcript length range (bp) 18-71382 131-60159 131-58349 131-45657 131-50033 131-50033 131-49970
N50 (bp) 3836 2597 2388 1831 3590 3527 3401
L50 5542 3745 3737 3789 3836 3816 3748
N90 (bp) 1345 477 372 249 1215 1194 1135
L90 26715 14985 16939 22581 12212 12182 12096
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N transcripts 18641 18719 19432
Total transcriptome length (bp) 45586004 45065633 43400943
GC content (%) 48.37% 48.39% 48.52%
N, % transcripts with length greater than or equal to 200 bp 18371, 98.55% 18312, 97.83% 18210, 93.71%
N, % transcripts with length greater than or equal to 500 bp 17019, 91.3% 16886, 90.21% 16573, 85.29%
N, % transcripts with length greater than or equal to 1000 bp 13620, 73.06% 13517, 72.21% 13131, 67.57%
N, % transcripts with length greater than or equal to 5000 bp 2024, 10.86% 1987, 10.61% 1872, 9.63%
N, % transcripts with length greater than or equal to 10000 bp 234, 1.26% 230, 1.23% 225, 1.16%
N, % transcripts with length greater than or equal to 20000 bp 10, 0.05% 12, 0.06% 11, 0.06%
Mean transcript length (bp) 2445.47 2407.48 2233.48
Median transcript length (bp) 1778 1749 1605
Transcript length IQR (bp) 944-3211 914-3153.5 776.75-2950
Transcript length range (bp) 131-50248 132-50317 132-50173
N50 (bp) 3602 3576 3485
L50 3876 3847 3794
N90 (bp) 1224 1214 1160
L90 12283 12234 12208

IQR = interquartile range
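
The N50, L50, N90 and L90 rows of Table 1 can be derived from the transcript lengths alone. The following is a minimal sketch, assuming the conventional definition: Nx is the length of the shortest transcript in the smallest set of longest transcripts that together cover at least x% of the total assembly length, and Lx is the number of transcripts in that set. The function name `nx_lx` and the toy lengths are illustrative only; whether CATS-rb handles ties and rounding in exactly this way is an assumption.

```python
def nx_lx(lengths, fraction=0.5):
    """Return (Nx, Lx) for a collection of transcript lengths.

    fraction=0.5 gives N50/L50, fraction=0.9 gives N90/L90.
    Conventional definition; not taken from the CATS-rb source code.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")

    # Walk from the longest transcript downwards, accumulating length
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            # length is Nx, count is Lx
            return length, count
    raise AssertionError("unreachable: cumulative sum always reaches the threshold")


# Toy example (hypothetical lengths, total = 54 bp)
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
n50, l50 = nx_lx(toy_lengths, 0.5)   # 10 + 9 + 8 = 27 >= 27
n90, l90 = nx_lx(toy_lengths, 0.9)   # 10 + 9 + 8 + 7 + 6 + 5 + 4 = 49 >= 48.6
print(n50, l50)  # 8 3
print(n90, l90)  # 4 7
```

Read this way, a higher N50 combined with a lower L50 means that fewer, longer transcripts account for half of the assembly length.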

Figure 1. Transcript length distribution (in bp).

Transcriptome assembly mapping analysis

Table 2. Transcriptome assembly mapping statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N, % unmapped transcripts 499, 1.4% 7, 0.03% 2, 0.01% 4, 0.01% 3, 0.02% 2, 0.01% 4, 0.02%
Transcript alignment proportion (mean, IQR) 1, 1-1 0.98, 1-1 0.98, 1-1 0.97, 1-1 0.99, 1-1 0.99, 1-1 0.99, 1-1
N, % multimapped transcripts 1029, 2.88% 136, 0.73% 216, 0.7% 188, 0.68% 152, 0.82% 153, 0.8% 147, 0.79%
N, % structurally inconsistent transcripts 547, 1.53% 1622, 5.85% 1997, 6.52% 2842, 7.76% 342, 1.84% 385, 2.06% 696, 3.62%
N exons 77321 76835 72591 189645 83914 84838 79302
N exons per transcript (mean, IQR) 5.38, 2-7 2.81, 1-3 2.54, 1-3 2.08, 1-2 4.54, 2-6 4.46, 2-6 4.26, 2-6
Exon length (bp) (mean, IQR) 488.33, 145-555 427.36, 148-478 406.46, 149-450 360.03, 150-388 528.65, 152-612 526.65, 152-610 521.59, 152-603
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N, % unmapped transcripts 0, 0.00% 2, 0.01% 9, 0.05%
Transcript alignment proportion (mean, IQR) 0.99, 1-1 0.99, 1-1 0.99, 1-1
N, % multimapped transcripts 153, 0.82% 162, 0.83% 249, 0.68%
N, % structurally inconsistent transcripts 301, 1.61% 390, 2.08% 511, 2.63%
N exons 81074 82706 84058
N exons per transcript (mean, IQR) 4.57, 2-6 4.52, 2-6 4.37, 2-6
Exon length (bp) (mean, IQR) 531.14, 153-613 529.81, 153-612 526.2, 153-610

IQR = interquartile range

Figure 2. Transcript alignment proportion category distribution.

Figure 3. Number of exons per transcript category distribution.

Figure 4. Exon length distribution (in bp).

Exon set analysis

Table 3. Exon set statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N exon sets 64093 63265 63560 62016 60365 60271 59789
Exon set length (bp) (mean, IQR) 559.41, 156-652 448.88, 153-514 425.07, 153-485 379.64, 153-421 558.64, 160-661 556.2, 160-658 550.61, 159-651
N, % exon sets included in completeness analyses 64093, 100% 63265, 100% 63560, 100% 62016, 100% 60365, 100% 60271, 100% 59789, 100%
N, % unique exon sets 2621, 4.09% 23, 0.04% 19, 0.03% 32, 0.05% 9, 0.01% 7, 0.01% 11, 0.02%
Unique exon set length (bp) (mean, IQR) 148.61, 69-152 537, 187.5-497 339.42, 169-464 276.88, 159-299.25 757.67, 181-521 368.57, 256-537 512.36, 158-476.5
N missing exon sets found in any transcriptome assembly 276 9960 11573 15872 4647 4928 5820
N missing exon sets found in all other transcriptome assemblies 7 814 1323 4937 53 70 196
Common exon set length (bp) (mean, IQR) 678.27, 184-830 532.96, 169-636 500.34, 164-590 428.51, 153-490 655.28, 181-805 651.73, 181-801 641.21, 180-788
Relative common exon set length (mean, IQR) 1, 1-1 0.87, 0.8-1 0.84, 0.71-1 0.77, 0.52-1 0.98, 1-1 0.98, 1-1 0.97, 1-1
Relative exon score 0.995 0.713 0.67 0.57 0.892 0.884 0.862
N missing exon sets inside transcript sets 47 3269 3397 3398 2273 2384 2511
N missing exon sets outside transcript sets 229 6691 8176 12474 2374 2544 3309
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N exon sets 60434 60320 60019
Exon set length (bp) (mean, IQR) 561.05, 161-663 559.62, 160-661 555.66, 160-658
N, % exon sets included in completeness analyses 60434, 100% 60320, 100% 60019, 100%
N, % unique exon sets 8, 0.01% 3, 0% 22, 0.04%
Unique exon set length (bp) (mean, IQR) 1503.38, 355-1313.75 188.67, 112.5-248.5 684.45, 170-529.5
N missing exon sets found in any transcriptome assembly 4463 4693 5286
N missing exon sets found in all other transcriptome assemblies 31 55 110
Common exon set length (bp) (mean, IQR) 658.19, 182-808 655.58, 181-804 648.09, 181-798
Relative common exon set length (mean, IQR) 0.98, 1-1 0.98, 1-1 0.97, 1-1
Relative exon score 0.898 0.892 0.877
N missing exon sets inside transcript sets 2254 2312 2403
N missing exon sets outside transcript sets 2209 2381 2883

IQR = interquartile range

Figure 5. Exon set length distribution (in bp).

Figure 6. Exon set genomic distribution.

Figure 7. Exon set UpSet plot.

Figure 8. Common exon set length distribution (in bp).

Figure 9. Common exon set relative length category distribution.

Figure 10. Unique exon set length distribution (in bp).

Figure 11. Exon set pairwise completeness similarity.

Figure 12. Pairwise exon set Venn diagrams.

Figure 13. Exon set hierarchical clustering heatmap.

Transcript set analysis

Table 4. Transcript set statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N transcript sets 12056 19948 22201 26626 12286 12441 12824
Transcript set length (bp) (mean, IQR) 7414.43, 1046.75-5999 3766.79, 315-2322 3236.22, 274-1839 2338.12, 227-1059 6935.85, 1079-5587.75 6831.46, 1052-5389 6504.71, 995-5018.5
N isoforms per transcript set (mean, IQR) 2.92, 1-3 1.38, 1-1 1.36, 1-1 1.31, 1-1 1.51, 1-1 1.49, 1-1 1.45, 1-1
N, % transcript sets included in completeness analyses 11951, 99.13% 19913, 99.82% 22152, 99.78% 26417, 99.22% 12278, 99.93% 12436, 99.96% 12810, 99.89%
N, % unique transcript sets 134, 1.12% 9, 0.05% 7, 0.03% 16, 0.06% 3, 0.02% 2, 0.02% 6, 0.05%
Unique transcript set length (bp) (mean, IQR) 835.46, 192-872 1065.56, 184-540 266.86, 166-199.5 219.69, 154.75-257.5 514.33, 506.5-525.5 362, 264-460 460.17, 163.5-502.75
N missing transcript sets found in any transcriptome assembly 113 655 784 1127 336 346 355
N missing transcript sets found in all other transcriptome assemblies 1 64 118 446 5 5 15
Common transcript set length (bp) (mean, IQR) 8443.63, 1394-7128.5 6537.63, 892-5111.25 6130.78, 748-4674 5124.08, 495.75-3694.25 7924.59, 1373.75-6569.5 7884.15, 1373-6488 7677.82, 1344-6240.25
Relative common transcript set length (mean, IQR) 1, 1-1 0.74, 0.55-0.96 0.68, 0.46-0.92 0.53, 0.28-0.8 0.96, 0.97-1 0.95, 0.96-1 0.93, 0.93-1
Relative transcript score 0.986 0.691 0.626 0.484 0.933 0.927 0.906
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N transcript sets 12172 12265 12541
Transcript set length (bp) (mean, IQR) 7041.55, 1093-5692 6975.79, 1082-5590 6740.52, 1049-5276
N isoforms per transcript set (mean, IQR) 1.53, 1-1 1.52, 1-1 1.48, 1-1
N, % transcript sets included in completeness analyses 12167, 99.96% 12262, 99.98% 12531, 99.92%
N, % unique transcript sets 2, 0.02% 2, 0.02% 11, 0.09%
Unique transcript set length (bp) (mean, IQR) 1381, 1128.5-1633.5 248.5, 202.25-294.75 3023.91, 297.5-1128.5
N missing transcript sets found in any transcriptome assembly 331 341 332
N missing transcript sets found in all other transcriptome assemblies 2 3 10
Common transcript set length (bp) (mean, IQR) 7987.33, 1384-6632.5 7954.15, 1375.75-6572.25 7814, 1370-6408.5
Relative common transcript set length (mean, IQR) 0.96, 0.98-1 0.96, 0.97-1 0.94, 0.95-1
Relative transcript score 0.939 0.933 0.921

IQR = interquartile range

Figure 14. Transcript set length distribution (in bp).

Figure 15. Number of isoforms per transcript set category distribution.

Figure 16. Transcript set UpSet plot.

Figure 17. Common transcript set length distribution (in bp).

Figure 18. Common transcript set relative length category distribution.

Figure 19. Unique transcript set length distribution (in bp).

Figure 20. Transcript set pairwise completeness similarity.

Figure 21. Pairwise transcript set Venn diagrams.

Figure 22. Transcript set hierarchical clustering heatmap.

Figure 23. Unique exon set position in non-origin transcriptomes.

Figure 24. Missing exon set position.

Annotation-based analysis

Table 5. Annotation-based statistics.

Parameter d_melanogaster_bdgp6 RSP_0.005_1_4 RSP_0.01_1_4 RSP_0.02_1_4 RSP_0.005_5_10 RSP_0.01_5_10 RSP_0.02_5_10
N, % exon sets included in completeness analyses 64093, 100% 63265, 100% 63560, 100% 62016, 100% 60365, 100% 60271, 100% 59789, 100%
N, % matched transcriptome assembly exon sets (exon set precision) 64068, 99.96% 63213, 99.92% 63522, 99.94% 61970, 99.93% 60344, 99.97% 60244, 99.96% 59769, 99.97%
N, % matched GTF exon sets (exon set recall) 64068, 98.78% 50950, 78.56% 48427, 74.67% 41823, 64.48% 59176, 91.24% 58870, 90.77% 57811, 89.14%
Proportion of covered transcriptome assembly exon sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1
Annotation-based exon score 0.988 0.707 0.665 0.565 0.885 0.877 0.855
N, % transcript sets included in completeness analyses 11951, 99.13% 19913, 99.82% 22152, 99.78% 26417, 99.22% 12278, 99.93% 12436, 99.96% 12810, 99.89%
N, % matched transcriptome assembly transcript sets (transcript set precision) 11925, 99.78% 19877, 99.82% 22112, 99.82% 26383, 99.87% 12256, 99.82% 12411, 99.8% 12786, 99.81%
N, % matched GTF transcript sets (transcript set recall) 11865, 97.53% 9989, 82.11% 9229, 75.87% 7236, 59.48% 11545, 94.9% 11521, 94.71% 11467, 94.26%
Proportion of covered transcriptome assembly transcript sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1 1, 1-1
Annotation-based transcript score 0.974 0.677 0.615 0.475 0.917 0.91 0.888
Parameter RSP_0.005_11_20 RSP_0.01_11_20 RSP_0.02_11_20
N, % exon sets included in completeness analyses 60434, 100% 60320, 100% 60019, 100%
N, % matched transcriptome assembly exon sets (exon set precision) 60409, 99.96% 60307, 99.98% 59982, 99.94%
N, % matched GTF exon sets (exon set recall) 59430, 91.63% 59187, 91.26% 58463, 90.14%
Proportion of covered transcriptome assembly exon sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1
Annotation-based exon score 0.891 0.885 0.869
N, % transcript sets included in completeness analyses 12167, 99.96% 12262, 99.98% 12531, 99.92%
N, % matched transcriptome assembly transcript sets (transcript set precision) 12147, 99.84% 12244, 99.85% 12496, 99.72%
N, % matched GTF transcript sets (transcript set recall) 11556, 94.99% 11571, 95.12% 11547, 94.92%
Proportion of covered transcriptome assembly transcript sets (mean, IQR) 1, 1-1 1, 1-1 1, 1-1
Annotation-based transcript score 0.922 0.919 0.905

IQR = interquartile range
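
The precision percentages in Table 5 can be reproduced from the counts reported in the same table. The following is a minimal sketch, assuming that precision is the number of matched assembly sets divided by the number of sets included in completeness analyses. The helper name `percent_matched` is illustrative and not part of CATS-rb. Recall would use the same ratio with the total number of GTF sets as the denominator; that total is not listed in this report, so it is not evaluated here.

```python
def percent_matched(matched, total):
    """Return matched / total as a percentage rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= matched <= total:
        raise ValueError("matched must be between 0 and total")
    return round(100.0 * matched / total, 2)


# Counts taken from Table 5, column RSP_0.005_1_4
exon_set_precision = percent_matched(63213, 63265)        # matched / included exon sets
transcript_set_precision = percent_matched(19877, 19913)  # matched / included transcript sets

print(exon_set_precision)        # 99.92
print(transcript_set_precision)  # 99.82
```

Both values agree with the percentages shown for RSP_0.005_1_4 in the precision rows of Table 5.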

Figure 25. Proportion of covered transcriptome exon sets by a GTF exon set category distribution.

Figure 26. Annotation-based exon set UpSet plot.

Figure 27. Annotation-based pairwise exon set Venn diagrams.

Figure 28. Annotation-based exon set hierarchical clustering heatmap.

Figure 29. Proportion of covered transcriptome transcript sets by a GTF transcript set category distribution.

Figure 30. Annotation-based transcript set UpSet plot.

Figure 31. Annotation-based pairwise transcript set Venn diagrams.

Figure 32. Annotation-based transcript set hierarchical clustering heatmap.